Bayesian estimation of probabilities of default 
for low default portfolios 



Dirk Tasche* 

First version: December 23, 2011 
This version: April 5, 2012 

The estimation of probabilities of default (PDs) for low default portfolios by means 
of upper confidence bounds is a well established procedure in many financial institu- 
tions. However, there are often discussions within the institutions or between insti- 
tutions and supervisors about which confidence level to use for the estimation. The 
Bayesian estimator for the PD based on the uninformed, uniform prior distribution 
is an obvious alternative that avoids the choice of a confidence level. In this paper, 
we demonstrate that in the case of independent default events the upper confidence 
bounds can be represented as quantiles of a Bayesian posterior distribution based 
on a prior that is slightly more conservative than the uninformed prior. We then 
describe how to implement the uninformed and conservative Bayesian estimators in 
the dependent one- and multi-period default data cases and compare their estimates 
to the upper confidence bound estimates. The comparison leads us to suggest a con- 
strained version of the uninformed (neutral) Bayesian estimator as an alternative to 
the upper confidence bound estimators. 

Keywords: Low default portfolio, probability of default, upper confidence bound, 
Bayesian estimator. 

1. Introduction 

The probability of default (PD) per borrower is a core input to modern credit risk modelling 
and managing techniques. As such, the appropriateness of the PD estimations determines the 
quality of the results of credit risk models. Despite the many defaults observed in the recent 
financial crisis, one of the obstacles connected with PD estimates can be the low number of 
defaults in the estimation sample because one might experience many years without any default
for good rating grades. Even if some defaults occur in a given year, the observed default rates
might exhibit a high degree of volatility over time. But even entire portfolios with low or no 
defaults are not uncommon in practice. Examples include portfolios with an overall good quality 
of borrowers (for example, sovereign or financial institutions portfolios) as well as high exposure 
but low borrower number portfolios (for example, specialised lending) and emerging markets 
portfolios of up to medium size. 

The Basel Committee might have had in mind these issues when they wrote "In general, es- 
timates of PDs, LGDs, and EADs are likely to involve unpredictable errors. In order to avoid 
over-optimism, a bank must add to its estimates a margin of conservatism that is related to the 
likely range of errors. Where methods and data are less satisfactory and the likely range of errors 
is larger, the margin of conservatism must be larger" (BCBS, 2006, part 2, paragraph 451). 

Pluto and Tasche (2005) suggested an approach to specify the required margin of conservatism 
for PD estimates. This method is based on the use of upper confidence bounds and the so-called 
most prudent estimation approach. Methods for building a rating system or a score function 
on a low default portfolio were proposed by a number of authors. See Erlenmaier (2006) for 
the 'rating predictor' approach and Kennedy et al. (2011) and Fernandes and Rocha (2011) for 
discussions of further alternative approaches. 

Although the Pluto and Tasche approach to PD estimation was criticised for delivering too 
conservative results (Kiefer, 2007), it seems to be applied widely by practitioners nonetheless. 
Interest in the approach might have been stimulated to some extent by the UK FSA's require- 
ment "A firm must use a statistical technique to derive the distribution of defaults implied by 
the firm's experience, estimating PDs (the 'statistical PD') from the upper bound of a confidence 
interval set by the firm in order to produce conservative estimates of PDs . . ." (BIPRU, 2011, 
4.3.95 R (2)). The Pluto and Tasche approach is also criticised for the subjectivity it involves 
as in the simplest version of the approach three parameters have to be pre-defined in order to 
be able to come up with a PD estimate. 

However, Pluto and Tasche (2011) suggested an approach to the estimation of the two correlation
parameters that works reasonably well when the time series of default data is not too short and
some defaults were recorded in the past. This paper is about how to get rid of the need to choose
a confidence level for the low default PD estimation.

Some authors proposed modifications of the Pluto and Tasche approach in order to facilitate 
its application and to better control its inherent conservatism (Forrest, 2005; Benjamin et al., 
2006). Other researchers looked for alternative approaches to statistically based low default PD 
estimation. Bayesian methods seem to be most promising. Kiefer (2009, 2010, 2011) explored 
in some detail the Bayesian approach with prior distributions determined by expert judgment. 
Clearly, Kiefer's approach makes the choice of a confidence level dispensable. However, this 
comes at the cost of introducing another source of subjectivity in the shape of expert judgment. 
Solutions to this problem were suggested by Dwyer (2007) and Orth (2011) who discussed the 
use of uninformed (uniform) prior distributions and empirical prior distributions respectively 
for PD estimation. 

In this paper we revisit a comment by Dwyer (2007) on a possible interpretation of the Pluto
and Tasche approach in Bayesian terms. We show that indeed in the independent one-period
case the upper confidence bound estimates of PDs are equivalent to quantiles of the Bayesian
posterior distribution of the PDs when the prior distribution is chosen appropriately conservative
(Section 2). We use the prior distribution identified this way to define versions of the conservative
Bayesian estimator of the PD parameter also in the one-period correlated (Section 3) and
multi-period correlated (Section 4) cases.

We compare the estimates generated with the conservative Bayesian estimator to estimates 
by means of the neutral Bayesian estimator and constrained versions of the neutral Bayesian 
estimator. It turns out that in practice the neutral and the conservative estimators do not differ 
very much. In addition, we show that the neutral estimator can be efficiently calculated in 
a constrained version (assuming that the long-run PD is not greater than 10%) because the 
constrained estimator produces results almost identical with the results of the unconstrained 
estimator. 

The Bayesian approach suggested in this paper is attractive for several reasons: 

• Its level of conservatism is reasonable. 

• It makes the often criticised subjective choice of a confidence level dispensable. 

• It is sensitive to the presence of correlation in the sense of delivering estimates comparable 
to upper confidence bound estimates at levels between 50% and 75% for low correlation 
default time series and estimates comparable to 75% and higher level upper confidence 
bounds for higher correlation default time series. 

In this paper, we consider only portfolio-wide long-run PD estimates but no per rating grade
estimates. How to spread the portfolio-wide estimate on sub-portfolios defined by rating grades
is discussed in Pluto and Tasche (2011, 'most prudent estimation') and in van der Burgt (2008)
and Tasche (2009, Section 5). The method discussed in Pluto and Tasche (2011) is purely based
on sub-portfolio sizes and can lead to counterintuitive estimates that hardly differ between
rating grades. The method proposed in van der Burgt (2008) and in Tasche (2009) requires that
an estimate of the discriminatory power of the rating system or score function in question is
known.

At first glance it might seem questionable to assume that there is one single long-run PD for an 
entire portfolio while at the same time trying to estimate long-run PDs for subportfolios defined 
by rating grades. However, this assumption can be justified by taking recourse to the 'law of 
rare events' (see, e.g., Durrett, 1996, Theorem (6.1)). As a consequence of this theorem, on a 
sufficiently large portfolio and as long as the PDs are not too large, for the distribution of the 
number of default events on the portfolio it does not matter whether the PDs are heterogeneous 
or homogeneous.

2. One observation period, independent defaults



Let us recall the low default PD estimation in the independent defaults, one observation period
setting as suggested by Pluto and Tasche (2005). The idea is to use the one-sided upper
confidence bound at some confidence level $\gamma$ (e.g. $\gamma = 50\%$, $\gamma = 75\%$, or $\gamma = 90\%$) as an estimator
of the long-run PD.

Assumption 2.1 At the beginning of the observation period (in practice often one year) there
are $n > 0$ borrowers in the portfolio. Defaults of borrowers occur independently, and all have the
same probability of default (PD) $0 < \lambda < 1$. At the end of the observation period $0 \le k < n$
defaults are observed among the $n$ borrowers.

As an example typical for low default portfolios think of Assumption 2.1 with $n = 1000$ and
$k = 1$.

What conclusion can we draw from the observation of the number of defaults $k$ on the value
of the PD $\lambda$? If we have a candidate value (an estimate) $\lambda_0$ for $\lambda$ we can statistically test the
(Null-)hypothesis $H_0$ that $\lambda \ge \lambda_0$.

Why $H_0 : \lambda \ge \lambda_0$ and not $H_0 : \lambda \le \lambda_0$? Because if we can reject $H_0$ we have proven (at a usually
relatively small type I error$^{1}$ level) that the alternative $H_1 : \lambda < \lambda_0$ is true and hence have found
an upper bound for the PD $\lambda$.

It is well-known that under Assumption 2.1 the number of defaults is binomially distributed 
and that the distribution function of the number of defaults can be written in terms of the 
Beta-distribution (Casella and Berger, 2002, Section 3.2 and Exercise 2.40). 

Proposition 2.2 Under Assumption 2.1 the random number of defaults $X$ in the observation
period is binomially distributed with size parameter $n$ and success probability $\lambda$, i.e. we have

$$P[X \le x] = P_\lambda[X \le x] = \sum_{\ell=0}^{x} \binom{n}{\ell} \lambda^\ell (1-\lambda)^{n-\ell}, \quad x \in \{0, 1, \ldots, n\}. \tag{2.1a}$$

The distribution function of $X$ can be calculated as function of the parameter $\lambda$ as follows:

$$P_\lambda[X \le x] = 1 - P[Y \le \lambda] = \frac{\int_\lambda^1 t^x (1-t)^{n-x-1}\,dt}{\int_0^1 t^x (1-t)^{n-x-1}\,dt}, \quad x \in \{0, 1, \ldots, n-1\}, \tag{2.1b}$$

where $Y$ is Beta-distributed$^{2}$ with shape parameters $x + 1$ and $n - x$.

By means of Proposition 2.2 we can test $H_0 : \lambda \ge \lambda_0$ based on the observed number of defaults
$X$ as test statistic. If $P_{\lambda_0}[X \le k] \le \alpha$ for some pre-defined type I error size $0 < \alpha < 1$ ($\alpha = 5\%$
is a common choice) we can safely conclude$^{3}$ that the outcome of the test is an unlikely event
under $H_0$ and that, therefore, $H_0$ should be rejected in favour of the alternative $H_1 : \lambda < \lambda_0$.

1 Type I error: Rejection of the Null-hypothesis although it is true.
Type II error: Acceptance of the Null-hypothesis although the alternative is true.

2 See, e.g., page 623 of Casella and Berger (2002) for the density and most important properties of the
Beta-distribution.

3 This test procedure is uniformly most powerful as a consequence of the Karlin-Rubin theorem (Casella and
Berger, 2002, Theorem 8.3.17) because the binomial distribution has a monotone likelihood ratio.

If we had $n = 1000$ borrowers in the portfolio at the beginning of the observation period and
observed $k = 1$ default by the end of the period, testing the Null-hypothesis $H_0 : \lambda \ge \lambda_0 = 1\%$
would lead to$^{4}$

$$P_{\lambda_0 = 1\%}[X \le 1] = 0.05\%. \tag{2.2}$$

Hence under $H_0$ the lower tail probability is clearly less than any commonly accepted type I error
size (like 1% or 5%) and thus we should reject $H_0$ in favour of the alternative $H_1 : \lambda < \lambda_0 = 1\%$.

However, given that the observed default rate was k/n = 1/1000 = 0.1% a PD estimate of 1% 
seems overly conservative even if we can be quite sure that the true PD does indeed not exceed 
1% (at least as long as we believe that Assumption 2.1 is justified). 

In view of the fact that the lower tail probability $P_{\lambda_0 = 1\%}[X \le 1]$ is much lower than a
reasonable type I error size of, say, $\alpha = 5\%$ we might want to refine the arbitrarily chosen
upper PD bound of $\lambda_0 = 1\%$ by identifying the set of all $\lambda_0$ such that

$$P_{\lambda_0}[X \le 1] \le \alpha = 5\%. \tag{2.3}$$

Equivalently, we may look for the threshold that separates the values of $\lambda_0$ for which
$H_0 : \lambda \ge \lambda_0$ is rejected at the $\alpha = 5\%$ error level for $k$ defaults observed from those for which it
is not. Technically speaking we then have to find the least $\lambda_0$ such that still $P_{\lambda_0}[X \le k] \le \alpha$, i.e.
we want to determine

$$\lambda_0^* = \inf\{0 < \lambda_0 < 1 : P_{\lambda_0}[X \le k] \le \alpha\}. \tag{2.4a}$$

Under Assumption 2.1, by continuity, $\lambda_0^*$ solves the equation

$$\sum_{\ell=0}^{k} \binom{n}{\ell} (\lambda_0^*)^\ell (1-\lambda_0^*)^{n-\ell} = P_{\lambda_0^*}[X \le k] = \alpha. \tag{2.4b}$$

Equation (2.1b) implies that the solution of (2.4b) is the $(1-\alpha)$-quantile of a related
Beta-distribution:

$$\lambda_0^* = q_{1-\alpha}(Y) = \min\{y : P[Y \le y] \ge 1 - \alpha\}, \tag{2.4c}$$

where $Y$ is Beta-distributed with shape parameters $k + 1$ and $n - k$. If we again consider the
case $n = 1000$, $k = 1$ and $\alpha = 5\%$ we obtain from (2.4c) that

$$\lambda_0^* = 0.47\%. \tag{2.5a}$$

This estimate of the PD $\lambda$ is much closer to the observed default rate of 0.1% but still, from a
practitioner's point of view, very conservative. Let us see how the estimate changes when we
choose much higher type I error sizes of 25% and 50% respectively (note that such high type I
error levels would not be acceptable from a test-theoretic perspective). With $n = 1000$, $k = 1$
and $\alpha = 25\%$ we obtain

$$\lambda_0^* = 0.27\%. \tag{2.5b}$$

The choice $n = 1000$, $k = 1$ and $\alpha = 50\%$ gives

$$\lambda_0^* = 0.17\%. \tag{2.5c}$$

4 Calculations for this paper were conducted by means of the statistics software R (R Development Core Team,
2010). R-scripts for the calculation of the tables and figures are available upon request from the author.

These last two estimates appear much more appropriate for the purpose of credit pricing or
impairment forecasting although we have to acknowledge that due to the independence condition$^{5}$
of Assumption 2.1 we are clearly ignoring cross-sectional and over time correlation effects (which
will be discussed in Sections 3 and 4).
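
The numbers in (2.2) and (2.5a) to (2.5c) can be checked with a few lines of code. The following Python sketch is only an illustration and not the R code used for this paper; the function names `binom_cdf` and `upper_confidence_bound` are hypothetical. Instead of calling a Beta quantile function as in (2.4c), it solves (2.4b) directly by bisection, which is equivalent because the binomial distribution function is decreasing in $\lambda$.

```python
from math import comb


def binom_cdf(k, n, lam):
    """P_lam[X <= k] for X ~ Binomial(n, lam), cf. (2.1a)."""
    return sum(comb(n, i) * lam**i * (1.0 - lam) ** (n - i) for i in range(k + 1))


def upper_confidence_bound(n, k, gamma, tol=1e-12):
    """Solve P_lam[X <= k] = 1 - gamma for lam by bisection, cf. (2.4b).

    The left-hand side is decreasing in lam, so for 0 <= k < n the root
    in (0, 1) is unique.
    """
    alpha = 1.0 - gamma
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if binom_cdf(k, n, mid) > alpha:
            lo = mid  # lower tail still above alpha: the bound lies to the right
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    n, k = 1000, 1
    print(f"lower tail probability at lambda_0 = 1%: {binom_cdf(k, n, 0.01):.4%}")
    for gamma in (0.95, 0.75, 0.50):
        bound = upper_confidence_bound(n, k, gamma)
        print(f"gamma = {gamma:.0%}: upper confidence bound = {bound:.4%}")
```

Bisection is chosen here only to keep the sketch free of external dependencies; with a statistics library at hand, the Beta quantile in (2.4c) gives the same bound in a single call.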

Before we discuss what type I error levels are appropriate for the estimation of long-run PDs 
by way of (2.4a) and solve this issue by taking recourse to Bayesian estimation methods, let us 
summarize what we have achieved so far. 

We have seen that, under Assumption 2.1, reasonable upper bounds for the long-run PD $\lambda$ can
be determined by identifying the set of estimates $\lambda_0$ such that the hypotheses $H_0 : \lambda \ge \lambda_0$ are
rejected at some pre-defined type I error level $\alpha$. By (2.4a) and (2.4c) this set has the shape of
a half-infinite interval $[\lambda_0^*, \infty)$. Equivalently, one could say that there is a half-infinite interval
$(-\infty, \lambda_0^*]$ of all the values of $\lambda_0$ such that the hypotheses $H_0 : \lambda \ge \lambda_0$ are accepted at the type I
error level $\alpha$. By the general duality theorem for statistical tests and confidence sets (Casella and
Berger, 2002, Theorem 9.2.2) we have 'inverted' the family of type I error level $\alpha$ tests specified
by (2.4a) to arrive at a one-sided confidence interval $(-\infty, \lambda_0^*]$ at level $\gamma = 1 - \alpha$ for the PD $\lambda$
which is characterised by the upper confidence bound $\lambda_0^*$. This observation does not depend on
any distributional assumption like Assumption 2.1.

Proposition 2.3 For any fixed confidence level $0 < \gamma < 1$, the number $\lambda_0^*(\gamma)$ defined by (2.4a)
with $\alpha = 1 - \gamma$ represents an upper confidence bound$^{6}$ at level $\gamma$ for the PD $\lambda$.

Together with (2.4c) Proposition 2.3 implies the following convenient representation of the upper
confidence bounds.

Corollary 2.4 Under Assumption 2.1, for any fixed confidence level $0 < \gamma < 1$, an upper
confidence bound $\lambda_0^*(\gamma)$ for the PD $\lambda$ at level $\gamma$ can be calculated by (2.4c) with $\alpha = 1 - \gamma$.

By Corollary 2.4, the upper confidence bounds for $\lambda$ are just the $\gamma$-quantiles of a
Beta-distribution with shape parameters $k + 1$ and $n - k$. This observation makes it possible to identify
the upper confidence bounds with Bayesian upper credible bounds$^{7}$ for a specific non-uniform
prior distribution of $\lambda$.

5 While the independence assumption appears unrealistic in the context of long-run PD estimation, it might 
be appropriate for the estimation of loss given default (LGD) or conversion factors for exposure at default 
(EAD). This comment applies to the situation where only zero LGDs or conversion factors were historically 
observed. The low default estimation method of section 2 could then be used for estimating the probability 
of a positive realisation of an LGD or conversion factor. Combined with the conservative assumption that a 
positive realisation would be 100%, such a probability of a positive realisation would give a conservative LGD 
or conversion factor estimate. 

6 By Casella and Berger (2002, Theorem 9.3.5) the confidence interval $(-\infty, \lambda_0^*]$ is the uniformly most accurate
confidence interval among all one-sided confidence intervals at level $\gamma$ for $\lambda$.

7 See Casella and Berger (2002, Section 9.2.4) for a discussion of the conceptual differences between classical
confidence sets and Bayesian credible sets.

Theorem 2.5 (Bayesian posterior distribution of PD) Under Assumption 2.1, assume in
addition that the PD $0 < \lambda < 1$ is the realisation of a random variable $\Lambda$ with unconditional
(prior) distribution$^{8}$

$$\pi((0, \lambda]) = \int_0^\lambda \frac{du}{1-u} = -\log(1-\lambda), \quad 0 < \lambda < 1. \tag{2.6a}$$

Denote by $X$ the number of defaults observed at the end of the observation period. Then the
conditional (posterior) distribution$^{9}$ of the PD $\Lambda$ given $X$ is

$$P[\Lambda \le \lambda \mid X = k] = \frac{\int_0^\lambda \ell^k (1-\ell)^{n-k-1}\,d\ell}{\int_0^1 \ell^k (1-\ell)^{n-k-1}\,d\ell}, \quad k \in \{0, 1, \ldots, n-1\}, \tag{2.6b}$$

i.e. conditional on $X = k$ the distribution of $\Lambda$ is a Beta-distribution with shape parameters $k + 1$
and $n - k$.



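Proof. By Proposition 2.2, since $k < n$ Equation (2.6b) is the result of the following calculation:

$$\begin{aligned}
P[\Lambda \le \lambda \mid X = k] &= \frac{P[\Lambda \le \lambda, X = k]}{P[X = k]} \\
&= \frac{\int_0^\lambda P[X = k \mid \Lambda = \ell]\,\frac{\partial \pi((0,\ell])}{\partial \ell}\,d\ell}{\int_0^1 P[X = k \mid \Lambda = \ell]\,\frac{\partial \pi((0,\ell])}{\partial \ell}\,d\ell} \\
&= \frac{\int_0^\lambda \binom{n}{k} \ell^k (1-\ell)^{n-k-1}\,d\ell}{\int_0^1 \binom{n}{k} \ell^k (1-\ell)^{n-k-1}\,d\ell} \\
&= \frac{\int_0^\lambda \ell^k (1-\ell)^{n-k-1}\,d\ell}{\int_0^1 \ell^k (1-\ell)^{n-k-1}\,d\ell}.
\end{aligned}$$

This proves the assertion. □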

At first glance, the prior distribution (2.6a) with the singularity in $\lambda = 1$ seems heavily biased
towards the higher potential values of $\lambda$. Due to this conservative bias, it makes sense to call the
distribution (2.6a) a conservative prior distribution. In any case, it is interesting to note that the
density $\lambda \mapsto \frac{1}{1-\lambda}$ of the prior distribution (2.6a) is increasing. This is a feature the conservative
prior has in common with the characteristic densities of spectral risk measures, a special class
of coherent risk measures (Acerbi, 2002; Tasche, 2002). We will see below that the conservative
shift induced by the prior distribution (2.6a) is actually quite moderate.

8 Note that $\pi$ is not a probability distribution as $\pi((0, 1)) = \infty$. However, in a Bayesian context working with
improper prior distributions is common as the prior distribution is only needed to reflect differences in the
initial subjective presumptions on the likelihoods of the parameters to be estimated. Due to the condition
$k < n$ from Assumption 2.1, the posterior distribution of $\Lambda$ turns out to be a proper probability distribution.

9 This result is a generalization of Dwyer (2007, Appendix C).

By definition, in a Bayesian setting a credible upper bound of a parameter is a quantile of
the posterior distribution of the parameter. By Corollary 2.4 and Theorem 2.5, both the
classical confidence bounds and the Bayesian credible bounds are quantiles of the same
Beta-distribution. Hence we can state the following result:

Corollary 2.6 Under Assumption 2.1, if the Bayesian prior distribution of the PD $\lambda$ is given by
(2.6a) then the classical one-sided upper confidence bounds at level $0 < \gamma < 1$ and the Bayesian
one-sided upper credible bounds of $\lambda$ coincide and are determined by (2.4c) with $\alpha = 1 - \gamma$.

Corollary 2.6 is a key result of this paper. We already knew from (2.4c) that the upper confi- 
dence bounds suggested by Pluto and Tasche (2005) as conservative estimates of the PD can 
be determined as quantiles of a Beta-distribution. However, Corollary 2.6 identifies this spe- 
cific Beta-distribution as a Bayesian posterior distribution of the PD for the conservative prior 
distribution (2.6a). 

In order to assess the extent of conservatism induced by the prior distribution (2.6a) we introduce 
a family of uniform prior distributions as described in the following proposition. 

Proposition 2.7 Under Assumption 2.1, let the Bayesian prior distribution of the PD $\lambda$ be
given by the uniform distribution on the interval $(0, u)$ for some $0 < u \le 1$. Denote by $X$ the
number of defaults observed at the end of the observation period. Then the conditional (posterior)
distribution of the PD given $X = k$ is specified by the density $f$ with

$$f(\lambda) = \begin{cases} \dfrac{b_{k+1,n-k+1}(\lambda)}{P[Y \le u]}, & u > \lambda > 0, \\[1ex] 0, & 1 > \lambda \ge u, \end{cases} \tag{2.7}$$

where $b_{k+1,n-k+1}$ denotes the density of the Beta-distribution with shape parameters $k + 1$ and
$n - k + 1$ and $Y$ is a random variable with this distribution.

Proof. The calculation for this proof is rather similar to the calculation in the proof of
Theorem 2.5. Denote by $\Lambda$ a random variable with uniform distribution on $(0, u)$ which in the Bayesian
context is associated with the PD. Then we have for $0 < \lambda < 1$

$$\begin{aligned}
P[\Lambda \le \lambda \mid X = k] &= \frac{P[\Lambda \le \lambda, X = k]}{P[X = k]} \\
&= \frac{\int_0^{\min(u,\lambda)} P[X = k \mid \Lambda = \ell]\,d\ell}{\int_0^u P[X = k \mid \Lambda = \ell]\,d\ell} \\
&= \frac{\int_0^{\min(u,\lambda)} \binom{n}{k} \ell^k (1-\ell)^{n-k}\,d\ell}{\int_0^u \binom{n}{k} \ell^k (1-\ell)^{n-k}\,d\ell} \\
&= \frac{\int_0^{\min(u,\lambda)} b_{k+1,n-k+1}(\ell)\,d\ell}{\int_0^u b_{k+1,n-k+1}(\ell)\,d\ell}.
\end{aligned} \tag{2.8}$$

Equation (2.8) implies (2.7). □

Observe that in the special case $u = 1$ of Proposition 2.7 the posterior distribution of the PD is
the Beta-distribution with shape parameters $k + 1$ and $n - k + 1$ as is well-known from textbooks
like Casella and Berger (2002, see Example 7.2.14).

The most natural estimator associated with a Bayesian posterior distribution is its mean. We 
determine the mean associated with the conservative prior (2.6a) in the following proposition. It 
is also of interest to consider the Bayesian estimators associated with the uniform distributions 
introduced in Proposition 2.7. In particular, the uniform distribution on (0, 1) is the natural 
uninformed (or neutral) prior for probability parameters. 



Proposition 2.8 Under Assumption 2.1, if the Bayesian prior distribution of the PD $\lambda$ is given
by (2.6a) then the mean $\lambda_1^*$ of the posterior distribution is given by

$$\lambda_1^* = \frac{k+1}{n+1}. \tag{2.9a}$$

$\lambda_1^*$ is called the conservative Bayesian estimator of the PD $\lambda$. If the Bayesian prior distribution
of the PD $\lambda$ is given by the uniform distribution on $(0, u)$ for some $0 < u \le 1$ then the mean
$\lambda_2^*(u)$ of the posterior distribution is given by

$$\lambda_2^*(u) = \frac{(k+1)\,P[Y_{k+2,n-k+1} \le u]}{(n+2)\,P[Y_{k+1,n-k+1} \le u]}, \tag{2.9b}$$

where $Y_{\alpha,\beta}$ denotes a random variable which is Beta-distributed with parameters $\alpha$ and $\beta$. $\lambda_2^*(u)$
is called the $(0, u)$-constrained neutral Bayesian estimator of the PD $\lambda$. For $u = 1$, we obtain
the (unconstrained) neutral Bayesian estimator $\lambda_2^*(1)$.

Proof. According to Theorem 2.5, the posterior distribution of the PD associated with the
conservative prior distribution is the Beta-distribution with parameters $k + 1$ and $n - k$. As the
mean of this Beta-distribution is $\frac{k+1}{n+1}$ this proves (2.9a). For (2.9b) we can compute

$$\begin{aligned}
E[\Lambda \mid X = k] &= \frac{\int_0^u \ell\,P[X = k \mid \Lambda = \ell]\,d\ell}{\int_0^u P[X = k \mid \Lambda = \ell]\,d\ell} \\
&= \frac{\int_0^u \binom{n}{k} \ell^{k+1} (1-\ell)^{n-k}\,d\ell}{\int_0^u \binom{n}{k} \ell^k (1-\ell)^{n-k}\,d\ell} \\
&= \frac{(k+1) \int_0^u b_{k+2,n-k+1}(\ell)\,d\ell}{(n+2) \int_0^u b_{k+1,n-k+1}(\ell)\,d\ell}.
\end{aligned}$$

This completes the proof. □

Observe that in the special case $u = 1$ of Proposition 2.8 the neutral Bayesian estimator is given
by

$$\lambda_2^*(1) = \frac{k+1}{n+2}. \tag{2.10}$$

The constrained neutral Bayesian estimator $\lambda_2^*(u)$ is differentiable with respect to $u$ in the open
interval $(0, 1)$. This follows from the following easy-to-prove lemma:

Lemma 2.9 Let $h : (0, 1) \to (0, \infty)$ be a continuous function. Then the function

$$H(u) = \frac{\int_0^u \lambda\,h(\lambda)\,d\lambda}{\int_0^u h(\lambda)\,d\lambda} \tag{2.11a}$$

is continuously differentiable with

$$H'(u) = \frac{h(u) \int_0^u (u-\lambda)\,h(\lambda)\,d\lambda}{\left(\int_0^u h(\lambda)\,d\lambda\right)^2} > 0. \tag{2.11b}$$

With $h(\lambda) = \lambda^k (1-\lambda)^{n-k}$ Lemma 2.9 immediately implies that $\lambda_2^*(u)$ is increasing in
$u$ as one would intuitively expect. When comparing $\frac{k}{n}$, the naive estimator of the PD under
Assumption 2.1, to the Bayesian estimators $\lambda_1^*$ and $\lambda_2^*(u)$, we can therefore notice the following
inequalities:

$$\begin{aligned}
\frac{k}{n} &\le \frac{k+1}{n+1} = \lambda_1^*, \\
\lambda_2^*(u) &\le \lambda_2^*(1) = \frac{k+1}{n+2} < \frac{k+1}{n+1} = \lambda_1^*, \\
\frac{k}{n} &\le \lambda_2^*(1) = \frac{k+1}{n+2} \iff 2k \le n.
\end{aligned} \tag{2.12}$$

Table 1: Different PD estimates under Assumption 2.1 with $k = 1$. Upper confidence bounds
according to Corollary 2.6. Naive estimator is $\frac{k}{n}$. Conservative and neutral Bayesian
estimators according to Proposition 2.8.



Estimator                         n = 125    250        500        1000       2000
Naive                             0.8%       0.4%       0.2%       0.1%       0.05%
50% upper confidence bound        1.339%     0.6704%    0.3354%    0.1678%    0.0839%
75% upper confidence bound        2.1396%    1.0734%    0.5376%    0.269%     0.1346%
90% upper confidence bound        3.076%     1.5469%    0.7757%    0.3884%    0.1943%
Neutral Bayesian on (0, 0.025)    0.9732%    0.7556%    0.3983%    0.1996%    0.0999%
Neutral Bayesian on (0, 0.05)     1.5051%    0.7935%    0.3984%    0.1996%    0.0999%
Neutral Bayesian on (0, 0.1)      1.5745%    0.7937%    0.3984%    0.1996%    0.0999%
Neutral Bayesian on (0, 1)        1.5748%    0.7937%    0.3984%    0.1996%    0.0999%
Conservative Bayesian             1.5873%    0.7968%    0.3992%    0.1998%    0.1%



Hence, the conservative Bayesian estimator is indeed more conservative than the naive estima- 
tor and the neutral Bayesian estimators. We conclude this section with a numerical example 
(Table 1), comparing the three estimators from (2.12) and the three upper confidence bounds 
at 50%, 75%, and 90% levels. From this example, some conclusions can be drawn: 

• Under the assumption of independent defaults, the Bayesian estimators tend to assume 
values between the 50% and 75% upper confidence bounds. Hence, choosing confidence 
levels between 50% and 75% seems plausible. This conclusion will be confirmed in Section 4. 

• However, as we will see in Sections 3 and 4, example calculations for the dependent case 
indicate that then the Bayesian estimators tend to assume values between 75% and 90% 
upper confidence bounds. 

• The difference between the neutral and the conservative Bayesian estimators is relatively
small and shrinks even more for larger $n$. This observation holds in general as will be
demonstrated in Sections 3 and 4.
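
The closed-form estimators of Proposition 2.8 are easy to evaluate. The following Python sketch is an illustration only, with hypothetical function names. For the Beta probabilities in (2.9b) it relies on the standard identity $P[Y_{a,b} \le u] = P[B \ge a]$ for integer shape parameters $a, b$, where $B$ is binomially distributed with size $a + b - 1$ and success probability $u$; this identity is an assumption of the sketch and not a formula stated in this paper.

```python
from math import comb


def beta_cdf_int(a, b, u):
    """P[Y <= u] for Y ~ Beta(a, b) with positive integer a and b.

    Uses P[Beta(a, b) <= u] = P[Binomial(a + b - 1, u) >= a]. The subtraction
    from 1 loses relative accuracy when the result is extremely small.
    """
    m = a + b - 1
    lower = sum(comb(m, i) * u**i * (1.0 - u) ** (m - i) for i in range(a))
    return 1.0 - lower


def naive(n, k):
    """Observed default rate k / n."""
    return k / n


def conservative_bayes(n, k):
    """Posterior mean under the conservative prior, cf. (2.9a)."""
    return (k + 1) / (n + 1)


def neutral_bayes(n, k, u=1.0):
    """Posterior mean under the uniform prior on (0, u), cf. (2.9b)."""
    ratio = beta_cdf_int(k + 2, n - k + 1, u) / beta_cdf_int(k + 1, n - k + 1, u)
    return (k + 1) / (n + 2) * ratio


if __name__ == "__main__":
    n, k = 1000, 1
    print(f"naive:                 {naive(n, k):.4%}")
    print(f"neutral on (0, 0.1):   {neutral_bayes(n, k, 0.1):.4%}")
    print(f"neutral on (0, 1):     {neutral_bayes(n, k):.4%}")
    print(f"conservative:          {conservative_bayes(n, k):.4%}")
```

For $u = 1$ both Beta probabilities equal one, so `neutral_bayes` reduces to (2.10).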

3. One observation period, correlated defaults 

In this section, we replace the unrealistic assumption of defaults occurring independently by the
assumption that default correlation is caused by one-factor dependence as in the Basel II credit
risk model (BCBS, 2004).

Assumption 3.1 At the beginning of the observation period there are $n > 0$ borrowers in the
portfolio. All borrowers have the same probability of default (PD) $0 < \lambda < 1$. The
event $D_i$ 'borrower $i$ defaults during the observation period' can be described as follows$^{10}$:

$$D_i = \{\sqrt{\varrho}\,S + \sqrt{1-\varrho}\,\xi_i \le \Phi^{-1}(\lambda)\}, \tag{3.1}$$

where $S$ and $\xi_i$, $i = 1, \ldots, n$, are independent and standard normal. $S$ is called systematic factor,
$\xi_i$ is the idiosyncratic factor relating to borrower $i$. The parameter $0 \le \varrho < 1$ is called asset
correlation. At the end of the observation period $0 \le k < n$ defaults are observed among the $n$
borrowers.

By (3.1), in the case $\varrho > 0$ the default events are no longer independent$^{11}$:

$$P[\text{Borrowers } i \text{ and } j \text{ default}] = P[D_i \cap D_j] = \Phi_2(\Phi^{-1}(\lambda), \Phi^{-1}(\lambda); \varrho) > \lambda^2 = P[D_i]\,P[D_j]. \tag{3.2}$$

We exclude the case $\varrho = 1$ from Assumption 3.1 because it corresponds to the situation where
there is only one borrower.

Without independence, Proposition 2.2 no longer applies. However, the following
easy-to-prove modification holds.

Proposition 3.2 Under Assumption 3.1 the random number of defaults $X$ in the observation
period is correlated binomially distributed with size parameter $n$, success probability $\lambda$, and asset
correlation parameter $0 \le \varrho < 1$. The distribution of $X$ can be represented as follows$^{12}$:

$$P[X \le k] = \int_{-\infty}^{\infty} \varphi(y) \sum_{i=0}^{k} \binom{n}{i} G(\lambda, \varrho, y)^i \bigl(1 - G(\lambda, \varrho, y)\bigr)^{n-i}\,dy, \tag{3.3a}$$

$$G(\lambda, \varrho, y) = \Phi\!\left(\frac{\Phi^{-1}(\lambda) - \sqrt{\varrho}\,y}{\sqrt{1-\varrho}}\right) = P[D_i \mid S = y]. \tag{3.3b}$$

The mean and the variance of $X$ are given by

$$\begin{aligned}
E[X] &= n\lambda, \\
\mathrm{var}[X] &= n(\lambda - \lambda^2) + n(n-1)\bigl(\Phi_2(\Phi^{-1}(\lambda), \Phi^{-1}(\lambda); \varrho) - \lambda^2\bigr).
\end{aligned} \tag{3.3c}$$

$P[X \le k]$ can be efficiently calculated by numerical integration. Alternatively, one can make use
of a representation of $P[X = k]$ by the distribution function of an $n$-variate normal distribution:

$$P[X = k] = \binom{n}{k} P[Z_1 \le \Phi^{-1}(\lambda), \ldots, Z_k \le \Phi^{-1}(\lambda), Z_{k+1} > \Phi^{-1}(\lambda), \ldots, Z_n > \Phi^{-1}(\lambda)], \tag{3.4}$$

where $(Z_1, \ldots, Z_n)$ is multi-variate normal with $Z_i \sim \mathcal{N}(0, 1)$, $i = 1, \ldots, n$, and $\mathrm{corr}[Z_i, Z_j] = \varrho$,
$i \ne j$.
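
The numerical integration in (3.3a) can be sketched as follows. This Python example is an illustration under stated assumptions and not the implementation used for this paper: the function names are hypothetical, the integral is truncated to $[-8, 8]$, and a plain trapezoidal rule with 1600 steps is used.

```python
from math import comb, sqrt
from statistics import NormalDist

_STD = NormalDist()  # standard normal distribution


def cond_pd(lam, rho, y):
    """G(lam, rho, y) from (3.3b): PD conditional on systematic factor S = y."""
    return _STD.cdf((_STD.inv_cdf(lam) - sqrt(rho) * y) / sqrt(1.0 - rho))


def corr_binom_cdf(k, n, lam, rho, y_max=8.0, steps=1600):
    """P[X <= k] from (3.3a) by the trapezoidal rule on [-y_max, y_max]."""
    h = 2.0 * y_max / steps
    total = 0.0
    for j in range(steps + 1):
        y = -y_max + j * h
        g = cond_pd(lam, rho, y)
        # Conditional on S = y the defaults are independent with PD g.
        inner = sum(comb(n, i) * g**i * (1.0 - g) ** (n - i) for i in range(k + 1))
        weight = 0.5 if j in (0, steps) else 1.0
        total += weight * _STD.pdf(y) * inner
    return total * h


if __name__ == "__main__":
    n, k, lam = 1000, 1, 0.01
    for rho in (0.0, 0.18, 0.24):
        print(f"rho = {rho:.2f}: P[X <= {k}] = {corr_binom_cdf(k, n, lam, rho):.4%}")
```

For $\varrho = 0$ the conditional PD equals $\lambda$ for every $y$, so the sketch reproduces the binomial distribution function (2.1a).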

Figure 1 demonstrates the impact of introducing correlation as by Assumption 3.1 on the bino- 
mial distribution. The variance of the distribution is much enlarged (as can be seen from (3.3c) 
and (3.2)), and so is the likelihood of assuming large or small values at some distance from the 
mean. 



10 $\Phi$ denotes the standard normal distribution function.

11 $\Phi_2$ denotes the bivariate normal distribution with standardised marginals.

12 $\varphi$ denotes the standard normal density function: $\varphi(s) = \frac{e^{-s^2/2}}{\sqrt{2\pi}}$.

Figure 1: Binomial and correlated binomial distributions with same size and success probability
parameters.

With regard to estimators for the PD $\lambda$ from Assumption 3.1, Equation (2.4a) represents the
general approach to upper confidence bound estimators, i.e. Proposition 2.3 still holds in the
more general 'correlated' context of Assumption 3.1. Under Assumption 3.1, however,
Corollary 2.4 no longer applies. Neither does Proposition 2.8, so there is no easy way of
calculating the Bayesian estimates in the case of correlated defaults. Instead the upper
confidence bound estimates and the Bayesian estimates have to be calculated by numerical procedures
involving one- and two-dimensional numerical integration and numerical root finding (for
the confidence bounds), as noted in the following proposition.



Proposition 3.3 Let $P_\lambda[X = k] = \int_{-\infty}^{\infty} \varphi(y) \binom{n}{k} G(\lambda, \varrho, y)^k \bigl(1 - G(\lambda, \varrho, y)\bigr)^{n-k}\,dy$ with the
function $G(\cdot)$ being defined as in Proposition 3.2. Under Assumption 3.1, we then have the following
estimators for the PD parameter $\lambda$:

(i) For any fixed confidence level $0 < \gamma < 1$, the upper confidence bound $\lambda_0^*(\gamma)$ for the PD $\lambda$
at level $\gamma$ can be calculated by equating the right-hand side of (3.3a) to $1 - \gamma$ and solving
the resulting equation for $\lambda$.

(ii) If the Bayesian prior distribution of the PD $\lambda$ is defined by (2.6a) then the mean $\lambda_1^*$ of the
posterior distribution is given by

$$\lambda_1^* = \frac{\int_0^1 \frac{\lambda\,P_\lambda[X = k]}{1-\lambda}\,d\lambda}{\int_0^1 \frac{P_\lambda[X = k]}{1-\lambda}\,d\lambda}. \tag{3.5a}$$

In particular, the integrals in the numerator and the denominator of the right-hand side
of (3.5a) are finite. $\lambda_1^*$ is called the conservative Bayesian estimator of the PD $\lambda$.

(iii) If the Bayesian prior distribution of the PD $\lambda$ is uniform on $(0, u)$ for some $0 < u \le 1$
then the mean $\lambda_2^*(u)$ of the posterior distribution is given by

$$\lambda_2^*(u) = \frac{\int_0^u \lambda\,P_\lambda[X = k]\,d\lambda}{\int_0^u P_\lambda[X = k]\,d\lambda}. \tag{3.5b}$$

$\lambda_2^*(u)$ is called the $(0, u)$-constrained neutral Bayesian estimator of the PD $\lambda$. For $u = 1$,
we obtain the (unconstrained) neutral Bayesian estimator $\lambda_2^*(1)$.

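Proof. Only the statement that the integrals in (3.5a) are finite is not obvious. Observe that
both for $a = 0$ and $a = 1$ we have

$$\int_0^1 \frac{\lambda^a\, P_\lambda[X = k]}{1 - \lambda}\, d\lambda \;\le\; \binom{n}{k} \int_0^1 \frac{\mathrm{E}\bigl[1 - G(\lambda, \varrho, Y)\bigr]}{1 - \lambda}\, d\lambda,$$

where $Y$ denotes a standard normal random variable. It is, however, a well-known fact that
$\mathrm{E}[G(\lambda, \varrho, Y)] = \lambda$. This implies $\int_0^1 \frac{\lambda^a\, P_\lambda[X = k]}{1 - \lambda}\, d\lambda < \infty$. □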
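The numerical work behind Proposition 3.3 (i) can be sketched in a few lines of Python. This is only an illustration, not the implementation used for the tables: it assumes that $G$ from (3.3b) has the usual one-factor form $G(\lambda, \varrho, y) = \Phi\bigl((\Phi^{-1}(\lambda) - \sqrt{\varrho}\, y)/\sqrt{1 - \varrho}\bigr)$, and the function names, the Simpson rule and the integration range $[-8, 8]$ are our own choices. The sketch checks the identity $\mathrm{E}[G(\lambda, \varrho, Y)] = \lambda$ used in the proof and finds the upper confidence bound by bisection.

```python
import math
from statistics import NormalDist

STD = NormalDist()  # standard normal distribution

def cond_pd(lam, rho, y):
    # ASSUMPTION: one-factor form of G from (3.3b); not quoted from the paper.
    return STD.cdf((STD.inv_cdf(lam) - math.sqrt(rho) * y) / math.sqrt(1.0 - rho))

def integrate_normal(f, lo=-8.0, hi=8.0, steps=400):
    # Composite Simpson rule for the integral of phi(y) * f(y) over [lo, hi].
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        y = lo + i * h
        w = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += w * STD.pdf(y) * f(y)
    return total * h / 3.0

def prob_at_most(k, n, lam, rho):
    # P_lambda[X <= k] for the correlated binomial distribution (small k only,
    # because the binomial coefficients are converted to floats).
    def cond_cdf(y):
        g = cond_pd(lam, rho, y)
        return sum(math.comb(n, j) * g**j * (1.0 - g)**(n - j) for j in range(k + 1))
    return integrate_normal(cond_cdf)

def upper_bound(k, n, rho, gamma, tol=1e-9):
    # Bisection on lambda: P_lambda[X <= k] decreases as lambda increases.
    lo, hi = 1e-12, 1.0 - 1e-12
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if prob_at_most(k, n, mid, rho) > 1.0 - gamma:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Identity used in the proof: E[G(lambda, rho, Y)] should be close to lambda.
check = integrate_normal(lambda y: cond_pd(0.01, 0.18, y))
bound_independent = upper_bound(1, 1000, 0.0, 0.5)
bound_correlated = upper_bound(1, 1000, 0.18, 0.5)
```

For $\varrho = 0$ the function `cond_pd` collapses to $\lambda$, so `upper_bound` reduces to the ordinary binomial bound of the independent case.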

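Since the mapping $\lambda \mapsto P_\lambda[X = k]$ is continuous, Lemma 2.9 implies that the neutral Bayesian
estimator $\lambda_2^*(u)$ is differentiable with respect to $u$ also under Assumption 3.1, with derivative

$$\frac{d\, \lambda_2^*(u)}{d\, u} = \frac{P_u[X = k] \int_0^u (u - \lambda)\, P_\lambda[X = k]\, d\lambda}{\left(\int_0^u P_\lambda[X = k]\, d\lambda\right)^2} > 0. \tag{3.6}$$

Hence the neutral Bayesian estimator $\lambda_2^*(u)$ from Proposition 3.3 is increasing in $u$, as in the
independent case.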
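The estimators (3.5a) and (3.5b) only need a second, outer integration over $\lambda$. The following Python sketch illustrates this under the same assumption on $G$ as above (the one-factor form $\Phi\bigl((\Phi^{-1}(\lambda) - \sqrt{\varrho}\, y)/\sqrt{1 - \varrho}\bigr)$, which is our reading of (3.3b)). The midpoint rule, the grid sizes and the function names are illustrative choices. The binomial coefficient is dropped because it cancels in both ratios.

```python
import math
from statistics import NormalDist

STD = NormalDist()

def likelihood(k, n, lam, rho, steps=200):
    # P_lambda[X = k] without the binomial coefficient, by Simpson's rule in y.
    # ASSUMPTION: G(lam, rho, y) = Phi((Phi^{-1}(lam) - sqrt(rho) y) / sqrt(1 - rho)).
    c = STD.inv_cdf(lam)
    a, b = math.sqrt(rho), math.sqrt(1.0 - rho)
    h = 16.0 / steps
    total = 0.0
    for i in range(steps + 1):
        y = -8.0 + i * h
        w = 1 if i in (0, steps) else (4 if i % 2 else 2)
        g = STD.cdf((c - a * y) / b)
        total += w * STD.pdf(y) * g**k * (1.0 - g)**(n - k)
    return total * h / 3.0

def bayes_estimators(k, n, rho, u=1.0, m=2000):
    # Midpoint rule in lambda on (0, u).
    # Returns (neutral estimator on (0, u), conservative estimator truncated at u).
    # The conservative estimator (3.5a) corresponds to u = 1.
    num_n = den_n = num_c = den_c = 0.0
    for i in range(m):
        lam = (i + 0.5) * u / m
        p = likelihood(k, n, lam, rho)
        num_n += lam * p
        den_n += p
        num_c += lam * p / (1.0 - lam)
        den_c += p / (1.0 - lam)
    return num_n / den_n, num_c / den_c

# Independent case (rho = 0): the posteriors are Beta distributions, so the
# results can be compared with (k + 1) / (n + 2) and (k + 1) / (n + 1).
neutral_indep, conservative_indep = bayes_estimators(1, 125, 0.0)
# Correlated case with two different constraints u (coarser grid to save time).
neutral_small_u = bayes_estimators(1, 125, 0.18, u=0.01, m=200)[0]
neutral_large_u = bayes_estimators(1, 125, 0.18, u=0.1, m=200)[0]
```

The last two calls illustrate the monotonicity in $u$ stated in (3.6): enlarging the support of the uniform prior can only increase the neutral estimate.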

Table 2, when compared to Table 1, shows that the impact of correlation on the one-period PD 
estimates is huge. For larger portfolio sizes and higher confidence levels, the impact of correlation 
is stronger than for smaller portfolios and lower confidence levels. 

While, thanks to the Bayesian estimators, it is possible to get rid of the subjectivity inherent
in the choice of a confidence level, it is not clear how to decide what should be the right level
of correlation for the PD estimation. The values $\varrho = 0.18$ and $\varrho = 0.24$ used for the calculations
for Table 2 are choices suggested by the Basel II Accord where the range of the asset correlation
for corporates is defined as [0.12, 0.24]. Hence in Table 2 we have looked at the mid-range and
upper threshold values of the correlation but there is no convincing rationale for why these values
should be more appropriate than others.






Table 2: One-period, correlated case for different asset correlation values. PD estimates under
Assumption 3.1 with k = 1. Naive estimator is $k/n$. Upper confidence bounds and neutral
and conservative Bayesian estimators according to Proposition 3.3.

Estimator                        n = 125     250        500        1000       2000
Naive                            0.8%        0.4%       0.2%       0.1%       0.05%

$\varrho = 0$: See Table 1.

$\varrho = 0.18$
50% upper confidence bound       2.172%      1.213%     0.6752%    0.3789%    0.2101%
75% upper confidence bound       4.6205%     2.7141%    1.5935%    0.9371%    0.5494%
90% upper confidence bound       8.3234%     5.1456%    3.166%     1.9408%    1.1889%
Neutral Bayesian on (0, 0.01)    0.5893%     0.5555%    0.5146%    0.4673%    0.4145%
Neutral Bayesian on (0, 0.1)     3.747%      2.9483%    2.2161%    1.6063%    1.136%
Neutral Bayesian on (0, 0.25)    5.1849%     3.6091%    2.4817%    1.701%     1.1664%
Neutral Bayesian on (0, 1)       5.3717%     3.6534%    2.491%     1.7028%    1.1669%
Conservative Bayesian            5.6706%     3.8092%    2.5724%    1.7455%    1.1894%

$\varrho = 0.24$
50% upper confidence bound       2.5847%     1.4981%    0.871%     0.5069%    0.2939%
75% upper confidence bound       5.7816%     3.5573%    2.1841%    1.3431%    0.8216%
90% upper confidence bound       10.7333%    6.9794%    4.5195%    2.9129%    1.8711%
Neutral Bayesian on (0, 0.01)    0.5909%     0.5631%    0.5312%    0.4955%    0.4564%
Neutral Bayesian on (0, 0.1)     4.1485%     3.5018%    2.8692%    2.287%     1.7805%
Neutral Bayesian on (0, 0.25)    6.4935%     4.9115%    3.6527%    2.6923%    1.977%
Neutral Bayesian on (0, 1)       7.1128%     5.1411%    3.7339%    2.7193%    1.9855%
Conservative Bayesian            7.6721%     5.4633%    3.9248%    2.8324%    2.0527%



In Section 4, we will explore how to estimate the asset correlation, while at the same time we
extend the range of the estimation samples to time series of default observations. Clearly, the 
assumption of having a time series of default observations for the PD estimation is more realistic 
than the one-period models we have studied so far. 

4. Multi-period observations, correlated defaults

According to BCBS (2006, part 2, paragraph 463), banks applying the IRB approach have to 
use at least 5 years of historical default data for their PD estimations. Ideally, the time series 
would cover at least one full credit cycle. Obviously, this requirement calls for a multi-period 
approach to PD estimation. 

The portfolio characteristic of low default numbers can often be observed over many years.
Clearly, multiple years of low default numbers should be reflected in the PD estimates. However,
when modelling for multi-period estimation of PDs, dependencies over time must be taken into
account because the portfolio includes the same borrowers over many years and the systematic
factors causing cross-sectional correlation of default events in different years are unlikely to
be uncorrelated.

In non-technical terms, the framework for the PD estimation methods described in this section 
can be explained as follows: 

• There is a time series $(n_1, k_1), \ldots, (n_T, k_T)$ of

— annual pool sizes $n_1, \ldots, n_T$ (as at beginning of the year), and

— annual observed numbers of defaults $k_1, \ldots, k_T$ (as at end of the year).

• The pool of borrowers observed for potential default is homogeneous with regard to the
long-run and instantaneous (point-in-time) PDs:

— At a fixed moment in time, all borrowers in the pool have the same instantaneous PD.

— All borrowers have the same long-run average PD.

• There is dependence of the borrowers' default behaviour causing cross-sectional and 
over-time default correlation: 

— At a fixed moment in time, a borrower's instantaneous PD is impacted by an idiosyn- 
cratic factor and a single systematic factor common to all borrowers. 

— The systematic factors at different moments in time are the more dependent, the smaller
the time difference is.

The following assumption provides the details of a technical framework for multi-period mod- 
elling of portfolio defaults in the presence of cross-sectional and over time dependencies that has 
the just mentioned features. 

Assumption 4.1 The estimation sample is given by a time series $(n_1, k_1), \ldots, (n_T, k_T)$ of
annual pool sizes $n_1, \ldots, n_T$ and annual observed numbers of defaults $k_1, \ldots, k_T$ with
$0 \le k_1 < n_1$, ..., $0 \le k_T < n_T$.
All borrowers have the same probability of default (PD) parameter $0 < \lambda < 1$. Default
events at time $t$ are impacted by the systematic factor $S_t$ which is assumed to be standard
normally distributed.

The systematic factors $(S_1, \ldots, S_T)$ are jointly normally distributed. The correlation of $S_t$ and
$S_\tau$ decreases with increasing difference of $t$ and $\tau$ as described in Equation (4.1a):



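$$\mathrm{corr}[S_t, S_\tau] = \vartheta^{|t - \tau|}. \tag{4.1a}$$

Default of borrower $A$ occurs at time $t$ if

$$\sqrt{\varrho}\, S_t + \sqrt{1 - \varrho}\; \xi_{A,t} \le \Phi^{-1}(\lambda). \tag{4.1b}$$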
Here $\xi_{A,t}$ is another standard normal variable, called idiosyncratic factor, independent of the
idiosyncratic factors relating to the other borrowers and $(S_1, \ldots, S_T)$.

The correlation parameters $0 \le \varrho < 1$ and $0 \le \vartheta < 1$ are the same for all borrowers and pairs
of borrowers respectively.



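The purpose of the time-correlation parameter $\vartheta$ is to capture time-clustering of default
observations. By (4.1a) the correlation matrix $\Sigma_\vartheta$ of the systematic factors has the following shape:

$$\Sigma_\vartheta = \begin{pmatrix}
1 & \vartheta & \vartheta^2 & \cdots & \vartheta^{T-1} \\
\vartheta & 1 & \vartheta & \cdots & \vartheta^{T-2} \\
\vdots & & \ddots & & \vdots \\
\vartheta^{T-2} & \cdots & \vartheta & 1 & \vartheta \\
\vartheta^{T-1} & \cdots & \vartheta^2 & \vartheta & 1
\end{pmatrix}. \tag{4.2}$$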



Since the correlation of a pair of systematic factors falls exponentially with increasing time 
difference the dependence structure has a local, short-term character. 
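The structure of (4.2) can be made concrete with a short Python sketch. It builds the matrix entry by entry from (4.1a) and draws the systematic factors with an AR(1) recursion, which is a swapped-in but equivalent way of sampling from a normal distribution with this correlation matrix (a stationary AR(1) process with unit variance has autocorrelation $\vartheta^{|t - \tau|}$). The function names are our own.

```python
import math
import random

def time_correlation_matrix(theta, T):
    # Entry (t, tau) equals theta ** |t - tau|, as in (4.1a) and (4.2).
    return [[theta ** abs(t - tau) for tau in range(T)] for t in range(T)]

def sample_systematic_factors(theta, T, rng):
    # AR(1) recursion: S_1 = Z_1, S_t = theta * S_{t-1} + sqrt(1 - theta^2) * Z_t
    # with independent standard normal Z_t. Every S_t is standard normal.
    s = [rng.gauss(0.0, 1.0)]
    for _ in range(1, T):
        s.append(theta * s[-1] + math.sqrt(1.0 - theta**2) * rng.gauss(0.0, 1.0))
    return s

sigma = time_correlation_matrix(0.6, 4)

# Empirical check of corr[S_1, S_3] = theta^2 on simulated paths.
rng = random.Random(1)
paths = [sample_systematic_factors(0.6, 3, rng) for _ in range(20000)]
mean_product = sum(p[0] * p[2] for p in paths) / len(paths)
```

The recursion avoids a Cholesky factorisation of $\Sigma_\vartheta$, which keeps the sketch free of external libraries.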



As in Section 3, the parameter $\varrho$ is called asset correlation. It controls the sensitivity of the
default events to the latent factors. The larger $\varrho$, the stronger the dependence between different
borrowers.
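To see how the two correlation parameters interact, default counts can be simulated directly from the framework described above. The Python sketch below is a hedged illustration: it assumes the threshold form $\sqrt{\varrho}\, S_t + \sqrt{1 - \varrho}\, \xi_{A,t} \le \Phi^{-1}(\lambda)$ for the default condition (4.1b), draws the systematic factors by an AR(1) recursion, and uses names and parameter values of our own choosing.

```python
import math
import random
from statistics import NormalDist

STD = NormalDist()

def simulate_defaults(lam, rho, theta, pool_sizes, rng):
    # ASSUMPTION: borrower A defaults in year t if
    #   sqrt(rho) * S_t + sqrt(1 - rho) * xi_{A,t} <= Phi^{-1}(lam).
    threshold = STD.inv_cdf(lam)
    a, b = math.sqrt(rho), math.sqrt(1.0 - rho)
    counts = []
    s = None
    for n_t in pool_sizes:
        z = rng.gauss(0.0, 1.0)
        # Systematic factor with corr[S_t, S_tau] = theta ** |t - tau|.
        s = z if s is None else theta * s + math.sqrt(1.0 - theta**2) * z
        defaults = 0
        for _ in range(n_t):
            if a * s + b * rng.gauss(0.0, 1.0) <= threshold:
                defaults += 1
        counts.append(defaults)
    return counts

# Illustrative run without asset correlation: eight years, 2000 borrowers per year.
rng = random.Random(7)
counts = simulate_defaults(0.05, 0.0, 0.3, [2000] * 8, rng)
```

With $\varrho > 0$ the yearly counts produced by `simulate_defaults` cluster in years with a low systematic factor, and a larger $\vartheta$ makes such years follow each other more often.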



Proposition 4.2 Under Assumption 4.1, denote by $X_t$ the random number of defaults observed
in year $t$. Define the function $G$ by (3.3b).

Then the distribution of $X_t$ is correlated binomial, as specified by (3.3a).

A borrower's unconditional (long-run) probability of default at time $t$ is $\lambda$, i.e.

$$P_\lambda[\text{Borrower } A \text{ defaults at time } t] = \lambda. \tag{4.3a}$$

A borrower's probability of default at time $t$ conditional on a realisation of the systematic factors
$(S_1, \ldots, S_T)$ (point-in-time PD) is given by

$$P_\lambda[\text{Borrower } A \text{ defaults at time } t \mid S_1, \ldots, S_T] = G(\lambda, \varrho, S_t). \tag{4.3b}$$

The probability to observe $k_1$ defaults at time 1, ..., $k_T$ defaults at time $T$, conditional on a
realisation of the systematic factors $(S_1, \ldots, S_T)$ is given by

$$P_\lambda[X_1 = k_1, \ldots, X_T = k_T \mid S_1, \ldots, S_T] = \prod_{t=1}^{T} \binom{n_t}{k_t}\, G(\lambda, \varrho, S_t)^{k_t} \bigl(1 - G(\lambda, \varrho, S_t)\bigr)^{n_t - k_t}. \tag{4.3c}$$

The unconditional probability to observe $k_1$ defaults at time 1, ..., $k_T$ defaults at time $T$ is given
by

$$P_\lambda[X_1 = k_1, \ldots, X_T = k_T] = \int \cdots \int \varphi_{\Sigma_\vartheta}(s_1, \ldots, s_T) \prod_{t=1}^{T} \binom{n_t}{k_t}\, G(\lambda, \varrho, s_t)^{k_t} \bigl(1 - G(\lambda, \varrho, s_t)\bigr)^{n_t - k_t}\, d(s_1, \ldots, s_T), \tag{4.3d}$$

where $\varphi_{\Sigma_\vartheta}$ denotes the multi-variate normal density (see, e.g. McNeil et al., 2005, Section 3.1.3
for the definition) with mean 0 and covariance matrix $\Sigma_\vartheta$ as defined by (4.2).
Proof. For a fixed time $t$ Assumption 4.1 implies Assumption 3.1. By Proposition 3.2, this
implies that $X_t$ is correlated binomial and (4.3a). By independence of $\xi_{A,t}$ and $(S_1, \ldots, S_T)$
and the fact that $\xi_{A,t}$ is standard normal, (4.3b) follows from (4.1b). Equation (4.3c) follows
from the observation that the default events as specified by (4.1b) are independent conditional
on realisations of the systematic factors $(S_1, \ldots, S_T)$. Equation (4.3d) is then an immediate
consequence of the definition of conditional probability. □
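Formulae (4.3c) and (4.3d) translate almost literally into code. The following Python sketch evaluates the conditional probability (4.3c) on the log scale and averages it over a given set of factor paths, which is a plain Monte Carlo stand-in for the integral in (4.3d). It assumes $G(\lambda, \varrho, s) = \Phi\bigl((\Phi^{-1}(\lambda) - \sqrt{\varrho}\, s)/\sqrt{1 - \varrho}\bigr)$ as our reading of (3.3b); the function names and the two hand-picked paths are purely illustrative.

```python
import math
from statistics import NormalDist

STD = NormalDist()

def cond_pd(lam, rho, s):
    # ASSUMPTION: one-factor form of G from (3.3b), for 0 < lam < 1.
    return STD.cdf((STD.inv_cdf(lam) - math.sqrt(rho) * s) / math.sqrt(1.0 - rho))

def log_cond_prob(lam, rho, factors, pool_sizes, defaults):
    # Logarithm of (4.3c) for one realisation (s_1, ..., s_T) of the factors.
    total = 0.0
    for s, n, k in zip(factors, pool_sizes, defaults):
        # Clamp G away from 0 and 1 so that the logarithms stay finite.
        g = min(max(cond_pd(lam, rho, s), 1e-300), 1.0 - 1e-12)
        log_binom = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        total += log_binom + k * math.log(g) + (n - k) * math.log1p(-g)
    return total

def joint_prob_mc(lam, rho, paths, pool_sizes, defaults):
    # Average of (4.3c) over the supplied factor paths, approximating (4.3d).
    values = [math.exp(log_cond_prob(lam, rho, p, pool_sizes, defaults)) for p in paths]
    return sum(values) / len(values)

pool = [125] * 8
observed = [0] * 7 + [1]
paths = [[0.3] * 8, [-1.2] * 8]  # hypothetical factor realisations
prob_indep = joint_prob_mc(0.001, 0.0, paths, pool, observed)
prob_corr = joint_prob_mc(0.001, 0.18, paths, pool, observed)
exact_indep = 125 * 0.001 * 0.999**999  # product of binomials for rho = 0
```

For $\varrho = 0$ the factors drop out of (4.3c), so `prob_indep` does not depend on the chosen paths and can be compared with the product of independent binomial probabilities.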

Remark 4.3 By (4.3d), for $\lambda > 0$, there is a positive - if very small - probability of observing
$n_1 + n_2 + \ldots + n_T$ defaults during the observation period of $T$ years. However, in realistic portfolios
this event would be impossible and hence have probability zero.

This observation implies that Assumption 4.1 is not fully realistic. It is possible to make
Assumption 4.1 more realistic by providing exact information about the years each borrower spent
in the portfolio and about the reasons why borrowers disappeared from the portfolio (default or
regular termination of the transactions with the borrower).

The original method for multi-period low default estimation suggested by Pluto and Tasche (2005) 
is based on such a cohort approach. Pluto and Tasche (2005) actually considered only the case 
where a cohort of borrowers being in the portfolio at time 1 was observed over time, without the 
possibility to leave the portfolio regularly. In addition, Pluto and Tasche assumed that no new 
borrowers entered the portfolio. This latter assumption can be removed, but at high computational 
cost. 

In this paper, we focus on the simpler (but slightly unrealistic) approach developed on the basis
of Assumption 4.1 and Proposition 4.2. This approach was called multiple binomial in Pluto
and Tasche (2011) and its numerical results were compared to results calculated by means of the
cohort approach from Pluto and Tasche (2005). Pluto and Tasche found that the differences of
the results by the two approaches were negligible. Thus, the multiple binomial approach based on
Assumption 4.1 can be considered a reasonable approximation to the more realistic but also more
involved cohort approach.

In principle, both (4.3c) and (4.3d) can serve as the basis for maximum likelihood estimation
of the model parameters $\lambda$ (PD), $\varrho$ (asset correlation), and $\vartheta$ (time correlation). Using (4.3c)
for maximum likelihood estimation requires the identification of the systematic factors with 
real, observable economic factors that explain all the systemic risk of the default events. While 
for corporate portfolios there are promising candidates for the identification of the systematic 
factors (see Aguais et al., 2006, for an example), it is not clear whether it is indeed possible to 
explain all the systemic risk of the portfolios by the time evolution of just one observable factor. 
Moreover, there are low default portfolios like banks or public sector entities for which there are 
no obvious observable economic factors that are likely to explain most of the systemic risk of 
the portfolios. 

In the following, it is assumed that the systematic factors $(S_1, \ldots, S_T)$ are latent (not observable)
and that, hence, maximum likelihood estimation of the model parameters $\lambda$, $\varrho$, and $\vartheta$ must be
based on Equation (4.3d). The right-hand side of (4.3d) is then proportional to the marginal
likelihood function that must be maximised as function of the model parameters. In technical
terms, the related procedure for finding the maximum likelihood estimates $\hat\lambda$, $\hat\varrho$, and $\hat\vartheta$ can be
described as

$$(\hat\lambda, \hat\varrho, \hat\vartheta) = \arg\max_{(\lambda, \varrho, \vartheta)} \int \cdots \int \varphi_{\Sigma_\vartheta}(s_1, \ldots, s_T) \prod_{t=1}^{T} G(\lambda, \varrho, s_t)^{k_t} \bigl(1 - G(\lambda, \varrho, s_t)\bigr)^{n_t - k_t}\, d(s_1, \ldots, s_T). \tag{4.4}$$

Solving the optimization problem (4.4) is demanding as it involves multi-dimensional integration
and the determination of an absolute maximum with respect to three variables. For Example 4.5
and Example 4.6 below, the multiple integrals were calculated by means of Monte-Carlo
simulation while the procedure nlminb from the software package R (R Development Core Team,
2010) was applied to the optimization problem. Note that the maximum likelihood estimates of
$\lambda$, $\varrho$, and $\vartheta$ are different from 0 only if $k_1 + \ldots + k_T > 0$ (i.e. only if at least one default was
observed).
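The simpler of the two optimisation problems, maximum likelihood estimation of $\lambda$ alone with pre-defined $\varrho$ and $\vartheta$, can be sketched as follows. This is not the nlminb-based procedure used for the examples: the sketch swaps in a golden-section search over $\lambda$, reuses the same simulated factor paths for every candidate value (common random numbers), and assumes both the one-factor form of $G$ and that the simulated log-likelihood is unimodal on the search interval. All names, the search interval and the default simulation size are our own choices.

```python
import math
import random
from statistics import NormalDist

STD = NormalDist()

def sample_path(theta, T, rng):
    # AR(1) recursion giving corr[S_t, S_tau] = theta ** |t - tau|.
    s = [rng.gauss(0.0, 1.0)]
    for _ in range(1, T):
        s.append(theta * s[-1] + math.sqrt(1.0 - theta**2) * rng.gauss(0.0, 1.0))
    return s

def mc_log_likelihood(lam, rho, paths, pool_sizes, defaults):
    # Log of the Monte Carlo average of prod_t G^k_t (1 - G)^(n_t - k_t), cf. (4.4).
    # ASSUMPTION: G(lam, rho, s) = Phi((Phi^{-1}(lam) - sqrt(rho) s) / sqrt(1 - rho)).
    c = STD.inv_cdf(lam)
    a, b = math.sqrt(rho), math.sqrt(1.0 - rho)
    logs = []
    for path in paths:
        ll = 0.0
        for s, n, k in zip(path, pool_sizes, defaults):
            g = min(max(STD.cdf((c - a * s) / b), 1e-300), 1.0 - 1e-12)
            ll += k * math.log(g) + (n - k) * math.log1p(-g)
        logs.append(ll)
    top = max(logs)  # log-sum-exp to avoid underflow
    return top + math.log(sum(math.exp(v - top) for v in logs) / len(logs))

def ml_pd(rho, theta, pool_sizes, defaults, n_sims=1000, seed=1,
          lo=1e-6, hi=0.05, tol=1e-8):
    # Golden-section search for the maximiser of the simulated log-likelihood.
    rng = random.Random(seed)
    paths = [sample_path(theta, len(pool_sizes), rng) for _ in range(n_sims)]
    f = lambda lam: mc_log_likelihood(lam, rho, paths, pool_sizes, defaults)
    r = (math.sqrt(5.0) - 1.0) / 2.0
    x1, x2 = hi - r * (hi - lo), lo + r * (hi - lo)
    f1, f2 = f(x1), f(x2)
    while hi - lo > tol:
        if f1 < f2:  # maximum lies in [x1, hi]
            lo, x1, f1 = x1, x2, f2
            x2 = lo + r * (hi - lo)
            f2 = f(x2)
        else:        # maximum lies in [lo, x2]
            hi, x2, f2 = x2, x1, f1
            x1 = hi - r * (hi - lo)
            f1 = f(x1)
    return 0.5 * (lo + hi)

# One default in 1000 obligor-years, no asset correlation.
estimate = ml_pd(0.0, 0.6, [125] * 8, [0] * 7 + [1], n_sims=5)
```

For $\varrho = 0$ the likelihood no longer depends on the simulated paths and the maximiser is the naive estimate, here 1/1000, which gives a simple sanity check of the search.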

Maximum likelihood estimates are best estimates in some sense but are not necessarily
conservative. In particular, if there are no default observations the maximum likelihood estimate of the
long-run PD is zero - which is unsatisfactory from the perspective of prudent risk management.
That is why it makes sense to extend the upper confidence bound and Bayesian approaches
from Sections 2 and 3 to the multi-period setting as described by Assumption 4.1. Bayesian
estimates in the context of Assumption 4.1 are straightforward while the determination of upper
confidence bounds requires another approximation since convolutions of binomial distributions are not
binomially but at best approximately Poisson distributed.

Proposition 4.4 Under Assumption 4.1, denote by $X_t$ the random number of defaults observed
in year $t$. Let $P_\lambda[X_1 = k_1, \ldots, X_T = k_T]$ be given by (4.3d) and let $X = X_1 + \ldots + X_T$ denote
the total number of defaults observed in the time period from $t = 1$ to $t = T$. Define the function
$G$ by (3.3b) and let $k = k_1 + \ldots + k_T$.

Then we have the following estimators for the PD parameter $\lambda$:

(i) For any fixed confidence level $0 < \gamma < 1$, the upper confidence bound $\lambda_0^*(\gamma)$ for the PD $\lambda$
at level $\gamma$ can be approximately calculated by solving the following equation for $\lambda$:

$$1 - \gamma = P_\lambda[X \le k] \approx \int \cdots \int \varphi_{\Sigma_\vartheta}(s_1, \ldots, s_T)\, \exp\bigl(-I_{\lambda,\varrho}(s_1, \ldots, s_T)\bigr) \sum_{j=0}^{k} \frac{I_{\lambda,\varrho}(s_1, \ldots, s_T)^j}{j!}\, d(s_1, \ldots, s_T), \tag{4.5a}$$

$$I_{\lambda,\varrho}(s_1, \ldots, s_T) = \sum_{t=1}^{T} n_t\, G(\lambda, \varrho, s_t).$$

(ii) If the Bayesian prior distribution of the PD $\lambda$ is given by (2.6a) then the mean $\lambda_1^*$ of the
posterior distribution is given by

$$\lambda_1^* = \frac{\int_0^1 \frac{\lambda\, P_\lambda[X_1 = k_1, \ldots, X_T = k_T]}{1 - \lambda}\, d\lambda}{\int_0^1 \frac{P_\lambda[X_1 = k_1, \ldots, X_T = k_T]}{1 - \lambda}\, d\lambda}. \tag{4.5b}$$

In particular, the integrals in the numerator and the denominator of the right-hand side
of (4.5b) are finite. $\lambda_1^*$ is called the conservative Bayesian estimator of the PD $\lambda$.

(iii) If the Bayesian prior distribution of the PD $\lambda$ is uniform on $(0, u)$ for some $0 < u \le 1$
then the mean $\lambda_2^*(u)$ of the posterior distribution is given by

$$\lambda_2^*(u) = \frac{\int_0^u \lambda\, P_\lambda[X_1 = k_1, \ldots, X_T = k_T]\, d\lambda}{\int_0^u P_\lambda[X_1 = k_1, \ldots, X_T = k_T]\, d\lambda}. \tag{4.5c}$$

$\lambda_2^*(u)$ is called the $(0, u)$-constrained neutral Bayesian estimator of the PD $\lambda$. For $u = 1$,
we obtain the (unconstrained) neutral Bayesian estimator $\lambda_2^*(1)$.



Proof. As the $X_t$ are independent and binomially distributed conditional on realizations of
the systematic factors $(S_1, \ldots, S_T)$, they are approximately Poisson distributed conditional on
$(S_1, \ldots, S_T)$, with intensities $n_t\, G(\lambda, \varrho, s_t)$, $t = 1, \ldots, T$. Approximation (4.5a) follows because
the sum of independent Poisson distributed variables is again Poisson distributed, with intensity
equal to the sum of the intensities of the variables. Formulae (4.5b) and (4.5c) for the Bayesian
estimators are straightforward. The finiteness of the integrals on the right-hand side of (4.5b)
can be shown as in the proof of Proposition 3.3. □
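The Poisson approximation (4.5a) lends itself to a compact numerical treatment: the integral is replaced by an average over simulated factor paths and the resulting equation is solved for $\lambda$ by bisection. The Python sketch below is a hedged illustration under the assumption that $G(\lambda, \varrho, s) = \Phi\bigl((\Phi^{-1}(\lambda) - \sqrt{\varrho}\, s)/\sqrt{1 - \varrho}\bigr)$; the AR(1) sampler, the function names and the simulation sizes are our own choices.

```python
import math
import random
from statistics import NormalDist

STD = NormalDist()

def sample_path(theta, T, rng):
    # AR(1) recursion giving corr[S_t, S_tau] = theta ** |t - tau|.
    s = [rng.gauss(0.0, 1.0)]
    for _ in range(1, T):
        s.append(theta * s[-1] + math.sqrt(1.0 - theta**2) * rng.gauss(0.0, 1.0))
    return s

def poisson_cdf_mc(lam, rho, paths, pool_sizes, k):
    # Monte Carlo version of the right-hand side of (4.5a).
    # ASSUMPTION: one-factor form of G, see the lead-in.
    c = STD.inv_cdf(lam)
    a, b = math.sqrt(rho), math.sqrt(1.0 - rho)
    total = 0.0
    for path in paths:
        intensity = sum(n * STD.cdf((c - a * s) / b) for s, n in zip(path, pool_sizes))
        term = math.exp(-intensity)  # Poisson probability of zero defaults
        cdf = term
        for j in range(1, k + 1):
            term *= intensity / j    # Poisson probability of j defaults
            cdf += term
        total += cdf
    return total / len(paths)

def upper_bound_multi(gamma, rho, theta, pool_sizes, defaults,
                      n_sims=1000, seed=1, tol=1e-9):
    # Bisection on lambda with common random numbers across iterations.
    rng = random.Random(seed)
    paths = [sample_path(theta, len(pool_sizes), rng) for _ in range(n_sims)]
    k = sum(defaults)
    lo, hi = 1e-12, 1.0 - 1e-12
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if poisson_cdf_mc(mid, rho, paths, pool_sizes, k) > 1.0 - gamma:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

pool, observed = [125] * 8, [0] * 7 + [1]
bound_50 = upper_bound_multi(0.5, 0.0, 0.6, pool, observed, n_sims=5)
bound_90 = upper_bound_multi(0.9, 0.0, 0.6, pool, observed, n_sims=5)
bound_90_corr = upper_bound_multi(0.9, 0.18, 0.6, pool, observed, n_sims=500)
```

Reusing the same paths for every candidate $\lambda$ keeps the simulated probability monotone in $\lambda$, which is what the bisection relies on.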

Observe that $P_\lambda[X_1 = k_1, \ldots, X_T = k_T]$ as given by (4.3d) is continuous in $\lambda$. By Lemma 2.9
this implies that $u \mapsto \lambda_2^*(u)$ is increasing with $u$ also under Assumption 4.1, again as is to be
intuitively expected.

We are going to illustrate the multi-period estimators of the correlation parameters and the PD
$\lambda$ that have been presented in (4.4) and in Proposition 4.4 by two numerical examples. The first
of the examples is for comparison with the results in Tables 1 and 2 and, therefore, is fictitious.
The second example is based on real default data as reported in Moody's (2011). Before we
present the examples, it is worthwhile to provide some comments on the numerical calculations
needed for the evaluation of the estimators.

The main difficulty in the numerical calculations for the multi-period setting is the evaluation of
the unconditional probability (4.3d) as it requires multi-dimensional integration. For the purpose
of this paper, we approximate the multi-variate integral by means of Monte-Carlo simulation,
i.e. we generate a sample $(s_1^{(1)}, \ldots, s_T^{(1)}), \ldots, (s_1^{(n)}, \ldots, s_T^{(n)})$ of independent realisations of the
jointly normally distributed systematic factors $(S_1, \ldots, S_T)$ from Assumption 4.1 and compute

$$P_\lambda[X_1 = k_1, \ldots, X_T = k_T] \approx \frac{1}{n} \sum_{i=1}^{n} \prod_{t=1}^{T} \binom{n_t}{k_t}\, G(\lambda, \varrho, s_t^{(i)})^{k_t} \bigl(1 - G(\lambda, \varrho, s_t^{(i)})\bigr)^{n_t - k_t}. \tag{4.6}$$

The right-hand side of (4.5a) is similarly approximated. The estimators (4.5b) and (4.5c),
however, require an additional integration with respect to a uniformly distributed variable. With a
view to preserving the monotonicity property of $u \mapsto \lambda_2^*(u)$ and to efficient calculation of $\lambda_2^*(u)$ for
different $u$, we approximate the estimators $\lambda_1^*$ and $\lambda_2^*(u)$ in the following specific way, which might
not be the most efficient.

Table 3: Fictitious default data for Example 4.5.

Year    Pool size    Defaults
2003    125          0
2004    125          0
2005    125          0
2006    125          0
2007    125          0
2008    125          0
2009    125          0
2010    125          1
All     1000         1

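For fixed $0 < u \le 1$ choose a positive integer $m$ and let

$$u_i = \frac{i}{m}\, u, \quad i = 0, 1, \ldots, m. \tag{4.7a}$$

Generate a sample $(s_1^{(1)}, \ldots, s_T^{(1)}), \ldots, (s_1^{(n)}, \ldots, s_T^{(n)})$ of independent realisations of the jointly
normally distributed systematic factors $(S_1, \ldots, S_T)$ from Assumption 4.1, with $n$ being an
integer possibly different to $m$. Based on $(u_0, \ldots, u_m)$ and $(s_1^{(1)}, \ldots, s_T^{(1)}), \ldots, (s_1^{(n)}, \ldots, s_T^{(n)})$ we
then use the below estimators of $\lambda_1^*$ and $\lambda_2^*(u)$:

$$\lambda_1^* \approx \frac{\sum_{i=0}^{m} u_i\, (1 - u_i)^{-1} \sum_{j=1}^{n} \prod_{t=1}^{T} G(u_i, \varrho, s_t^{(j)})^{k_t} \bigl(1 - G(u_i, \varrho, s_t^{(j)})\bigr)^{n_t - k_t}}{\sum_{i=0}^{m} (1 - u_i)^{-1} \sum_{j=1}^{n} \prod_{t=1}^{T} G(u_i, \varrho, s_t^{(j)})^{k_t} \bigl(1 - G(u_i, \varrho, s_t^{(j)})\bigr)^{n_t - k_t}}, \tag{4.7b}$$

$$\lambda_2^*(u) \approx \frac{\sum_{i=0}^{m} u_i \sum_{j=1}^{n} \prod_{t=1}^{T} G(u_i, \varrho, s_t^{(j)})^{k_t} \bigl(1 - G(u_i, \varrho, s_t^{(j)})\bigr)^{n_t - k_t}}{\sum_{i=0}^{m} \sum_{j=1}^{n} \prod_{t=1}^{T} G(u_i, \varrho, s_t^{(j)})^{k_t} \bigl(1 - G(u_i, \varrho, s_t^{(j)})\bigr)^{n_t - k_t}}. \tag{4.7c}$$

The right-hand side of (4.7b) has been stated deliberately for general $u \le 1$ although in theory
according to (4.5b) only $u = 1$ is needed. The reason for this generalization is that the values
of the functions integrated in (4.5b) and (4.5c) are very close to zero for $\lambda$ much greater than
$\frac{\sum_{t=1}^{T} k_t}{\sum_{t=1}^{T} n_t}$ and, therefore, can be ignored for the purpose of evaluating the integrals.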
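The grid-based approximations (4.7b) and (4.7c) can be sketched in Python as follows. The sketch is only an illustration under stated assumptions: $G$ is taken to be $\Phi\bigl((\Phi^{-1}(\lambda) - \sqrt{\varrho}\, s)/\sqrt{1 - \varrho}\bigr)$, the factor paths come from an AR(1) recursion, $u$ is restricted to $u < 1$ so that the weights $(1 - u_i)^{-1}$ stay finite, and the default grid and simulation sizes are chosen smaller than those reported in the appendices to keep the run time short. The products are computed directly, which may underflow for very large portfolios.

```python
import math
import random
from statistics import NormalDist

STD = NormalDist()

def sample_path(theta, T, rng):
    # AR(1) recursion giving corr[S_t, S_tau] = theta ** |t - tau|.
    s = [rng.gauss(0.0, 1.0)]
    for _ in range(1, T):
        s.append(theta * s[-1] + math.sqrt(1.0 - theta**2) * rng.gauss(0.0, 1.0))
    return s

def inner_sum(lam, rho, paths, pool_sizes, defaults):
    # Sum over the simulated paths of prod_t G^k_t (1 - G)^(n_t - k_t),
    # i.e. the inner sum appearing in (4.7b) and (4.7c).
    if lam <= 0.0:  # G(0, rho, s) = 0, so only the all-zero default vector survives
        return float(len(paths)) if sum(defaults) == 0 else 0.0
    c = STD.inv_cdf(lam)
    a, b = math.sqrt(rho), math.sqrt(1.0 - rho)
    total = 0.0
    for path in paths:
        prod = 1.0
        for s, n, k in zip(path, pool_sizes, defaults):
            g = STD.cdf((c - a * s) / b)  # ASSUMED form of G
            prod *= g**k * (1.0 - g)**(n - k)
        total += prod
    return total

def bayes_multi_period(rho, theta, pool_sizes, defaults,
                       u=0.1, m=500, n_sims=200, seed=1):
    # Returns (conservative estimate as in (4.7b), neutral estimate as in (4.7c)).
    if not 0.0 < u < 1.0:
        raise ValueError("this sketch requires 0 < u < 1")
    rng = random.Random(seed)
    paths = [sample_path(theta, len(pool_sizes), rng) for _ in range(n_sims)]
    num_c = den_c = num_n = den_n = 0.0
    for i in range(m + 1):
        u_i = i * u / m
        w = inner_sum(u_i, rho, paths, pool_sizes, defaults)
        num_n += u_i * w
        den_n += w
        num_c += u_i * w / (1.0 - u_i)
        den_c += w / (1.0 - u_i)
    return num_c / den_c, num_n / den_n

# One default in 1000 obligor-years, no asset correlation.
conservative, neutral = bayes_multi_period(0.0, 0.6, [125] * 8, [0] * 7 + [1],
                                           u=0.1, m=1000, n_sims=5)
```

Because both estimates are computed from the same grid values and the weights $(1 - u_i)^{-1}$ increase with $u_i$, the conservative estimate returned by `bayes_multi_period` can never fall below the neutral one.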

Example 4.5 (Fictitious data) We apply the estimators (4.4), (4.5a), (4.5c), and (4.5b) to
the fictitious default data time series presented in Table 3. The output generated by the calculation
with an R-script is listed in Appendix A.

Example 4.6 (Real data) We apply the estimators (4.4), (4.5a), (4.5c), and (4.5b) to the
default data time series presented in Table 4 in order to determine a long-run PD estimate for
entities rated as investment grade (grades Aaa, Aa, A, and Baa) by the rating agency Moody's.
The output generated by the calculation with an R-script is listed in Appendix B.

Comments on the computation characteristics and results shown in Appendices A and B: 

• The calculation output documented in both Appendices starts with some characteristics
of the Monte Carlo simulations used in the course of the calculations. The computations
for the two maximum likelihood (ML) estimators (for the three parameters $\lambda$, $\varrho$, and $\vartheta$
together and for $\lambda$ alone, with pre-defined values of $\varrho$ and $\vartheta$) are based on 16 runs of
10,000 iterations, effectively producing estimates each based on 160,000 iterations.
Similarly, the computations for the upper confidence bounds are each based on 16 runs of
10,000 iterations.

• Sixteen Monte Carlo runs were also used for the Bayesian estimators. However, as the
Bayesian estimation according to Proposition 4.4 requires inner integration for the
unconditional probabilities and outer integration with respect to $\lambda$, the documented calculation
output lists both the number of simulation iterations ($n$ in (4.7b) and (4.7c)) for the inner
integral and the number of steps ($m$ in (4.7b) and (4.7c)) in the outer integral.

• The split into 16 runs was implemented in order to deliver rough estimates of the estimation
uncertainty inherent in the Monte Carlo simulation. The standard deviations shown in
Appendices A and B below the different estimates are effectively the standard deviations
of the means of the 16 runs each with 10,000 iterations. Hence, the standard deviation of
a single run of 10,000 iterations can be determined by multiplying the tabulated standard
deviations by $4 = \sqrt{16}$.

• Below the Monte Carlo characteristics, summary metrics of the default data from Tables 3
and 4 respectively are shown. The naive PD estimates are calculated as number of observed
defaults divided by number of obligor-years.

• The maximum likelihood estimates listed in the appendices were determined by solving the
optimisation problem (4.4) (case of estimated correlations) and the related optimisation
problem for the PD $\lambda$ only (case of pre-defined correlations). The calculations for the upper
confidence bounds and the Bayesian estimators were based on the formulae presented in
Proposition 4.4. In addition, Monte Carlo approximations according to (4.6), (4.7b) and
(4.7c) were used.

• In both cases (estimated correlations and pre-defined correlations respectively) the (uncon- 
strained) neutral and conservative Bayesian estimates were approximated by the (0,0.1)- 
constrained estimates (i.e. u = 0.1 in (4.7b) and (4.7c)). Test calculations not documented 
in this paper showed that there is practically no difference between these constrained es- 
timates and the unconstrained estimates (with u = 1) as long as the naive estimates are 
of a magnitude of not more than a few basis points. 

• The constrained neutral Bayesian estimates were calculated with the constraint $u$ given
by the corresponding 99%-upper confidence bounds of the long-run PD parameter $\lambda$.
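The relation between the tabulated standard deviations and the standard deviation of a single run, as described in the comments above, amounts to the usual standard error of a mean. A minimal Python sketch with purely hypothetical run results (they are not taken from the appendices) is:

```python
import math
import statistics

def summarise_runs(run_estimates):
    # Mean of the runs, standard deviation of that mean, and standard
    # deviation of a single run (sample standard deviation across runs).
    mean = statistics.fmean(run_estimates)
    sd_single_run = statistics.stdev(run_estimates)
    sd_of_mean = sd_single_run / math.sqrt(len(run_estimates))
    return mean, sd_of_mean, sd_single_run

# HYPOTHETICAL estimates (in bps) from 16 Monte Carlo runs, for illustration only.
runs = [58.1, 59.4, 57.2, 60.3, 58.8, 56.9, 59.9, 58.5,
        57.7, 60.8, 58.2, 59.1, 57.5, 58.9, 59.6, 58.0]
mean, sd_of_mean, sd_single_run = summarise_runs(runs)
```

With 16 runs, `sd_single_run` is four times `sd_of_mean`, which is the factor $4 = \sqrt{16}$ mentioned in the comments.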

Some observations on the estimation results for Examples 4.5 and 4.6 as presented in
Appendices A and B:



• The multi-period case is situated between the independent and correlated one-period cases, 
in the sense of exhibiting heavier tails of the default number distribution than the indepen- 
dent one-period case and lighter tails of the default number distribution than the correlated 
one-period case. This follows from a comparison of the upper confidence bound results in 
Appendix A to the n = 1000 columns in Tables 1 and 2. In general, the heavier the tail of 
the default number distribution is as a consequence of default correlation, the stronger is 
the growth of the upper confidence bounds with increasing confidence level. 

• Example 4.5, case "estimated correlations", and Example 4.6, both cases of estimated and 
pre-defined correlations, are qualitatively closer to the one-period independent case while 
Example 4.5 "pre-defined correlations" is qualitatively closer to the one-period correlated 
case. 

• In Example 4.5, case "estimated correlations", the estimated asset correlation is zero. 
Therefore, the case "estimated correlations" of Example 4.5 is indeed equivalent to the 
independent one-period case (cf. results for upper bounds and unconstrained neutral 
Bayesian estimate in Appendix A and the column for portfolio size n = 1000 in Table 1). 

• Due to the relatively long time span of 21 years covered by the default data time series
in Example 4.6, the average of the elements in the time correlation matrix (4.2) is quite
small. As, moreover, the portfolio sizes in any fixed year are small compared to the number
of all observations in the time series, the tail of the default number distribution is
relatively light, as in the independent one-period case. This is, in particular, indicated in
the results shown in Appendix B by the fact that the Bayesian estimates for all three cases
(unconstrained neutral, constrained neutral, conservative) are practically identical.

• In the case "pre-defined correlations" of Example 4.5 the combination of a relatively short 
time series with significant asset correlation and time correlation triggers a relatively heavy- 
tailed default number distribution. One consequence of this are the significant differences 
between the three different Bayesian estimates presented in Appendix A. This behaviour 
is more similar to the behaviour of the Bayesian estimators in the one-period correlated 
case (see Table 2) than to the behaviour of these estimators in the one-period independent 
case (see Table 1). 

• For small portfolio sizes and higher default correlation, the conservative Bayesian estimates
are significantly greater than the neutral Bayesian estimates, while in the case of
independent defaults there is hardly any difference between the conservative and the neutral
estimates.

• Undocumented observation: Due to the portfolio size and the length of the time series,
Example 4.6 is close to the limits of what can still be dealt with by means of the numerical
procedures described in this paper (see footnote 13).

The observations on Examples 4.5 and 4.6 suggest that the neutral Bayesian estimator (applied
as (0, 0.1)-constrained estimator) gives appropriately conservative estimates of the long-run PD
parameter (not only in the very low default case). This estimator generates estimates between
the 50% and 75% upper confidence bounds in the less correlated cases (short time series with low
asset correlation or longer time series) and estimates between the 75% and 90% upper confidence
bounds in the more correlated cases. This way, the neutral Bayesian estimator is more sensitive
to the presence of correlation in the data than the upper confidence bound estimators. The
conservative Bayesian estimator has a similar property of being sensitive to correlation but
appears not to differ significantly from the neutral estimator even in the presence of correlation.

13 See Wilde and Jackson (2006) for an alternative, less computationally intensive approach to the calculations.

Table 4: Default data (Source: Moody's, 2011, Exhibits 17 and 42) for Example 4.6. The table
lists the numbers of entities rated as investment grade (grades Aaa, Aa, A, and Baa) by
Moody's at the beginning of the year and the numbers of defaults among these entities
observed by year end.

Year    Pool size       Defaults
1990    1492            0
1991    1543            1
1992    1624            0
1993    1731            0
1994    1888            0
1995    2012            0
1996    2209            0
1997    2412            0
1998    2593            1
1999    2742            1
2000    (illegible)     4
2001    (illegible)     4
2002    3128            14
2003    3015            0
2004    2977            0
2005    3025            2
2006    3082            0
2007    3108            0
2008    3133            14
2009    3048            11
2010    2966            2
All     53630           54
5. Conclusions



In this paper, we have revisited the approach for the estimation of PDs for low default portfolios as
suggested in Pluto and Tasche (2005). For the one-period case with independent default events, 
we have shown that the upper confidence bounds for the PD can be calculated as quantiles of the 
Bayesian posterior distribution for a simple prior that is more conservative than the uninformed 
neutral prior. This observation suggests that Bayesian estimators computed as means of the 
posterior distributions can serve as an alternative to the upper confidence bounds approach. 
Such an alternative is welcome because it makes the necessarily subjective choice of a confidence 
level redundant. 

We have explored generalisations of the conservative Bayesian estimator of the one-period in- 
dependent case for the one- and multi-period correlated cases and compared their estimates to 
estimates by means of upper confidence bounds and of Bayesian estimators with constrained 
and unconstrained neutral priors. We have found that a constrained neutral Bayesian estima- 
tor delivers plausible estimates and is sensitive to the presence of correlation by being situated 
between the 50% and 75% upper bounds for low correlation regimes and between the 75% and 
90% upper bounds for higher correlation regimes. Constrained neutral Bayesian estimators are
computationally more efficient than the unconstrained neutral Bayesian estimators and still good
approximations when the constraints are carefully chosen. In particular, the (0, 0.1)-constrained
neutral Bayesian estimator appears to be an appropriate tool for conservative long-run PD 
estimation, avoiding the issue of which confidence level to choose for the estimation. 
A. Appendix: Output of calculation for Example 4.5



Sun Apr 01 17:01:21 2012 
Multiperiod low default estimation 
Fictitious Default Data 

Random seed: 36 

Number of ML simulation iterations: 10000 
Number of ML simulation runs: 16 

Number of confidence bounds simulation iterations: 10000 

Number of confidence bounds simulation runs: 16 

Number of inner Bayesian simulation iterations: 1000 

Number of outer Bayesian steps: 2500 

Number of Bayesian simulation runs: 16 

Length of time period: 8 

Total number of obligor-years: 1000 

Total observed number of defaults: 1 

Naive PD estimate (bps) : 10 

Estimates with estimated correlations: 
ML estimate for PD (bps): 10.0 
Standard deviation (bps): 0.0 
ML estimate for rho (%): 0.0
Standard deviation (%): 0.0
ML estimate for theta (%): 12.4
Standard deviation (%): 4.7

Conf. level (%)   & 50.00 & 75.00 & 90.00 & 95.00 & 99.00 & 99.90
Upper bound (bps) & 16.8 & 26.9 & 38.8 & 47.3 & 66.2 & 92.6
Std. dev. (bps)   & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 & 0.0

Bayesian neutral estimate for PD (bps): 20.0 
Standard deviation (bps): 0.0 

Bayesian constrained estimate for PD (bps): 19.4 
Standard deviation (bps): 0.0 

Bayesian conservative estimate for PD (bps): 20.0 
Standard deviation (bps): 0.0 

Estimates with pre-defined correlations: 
Asset correlation (%): 18.0
Time correlation deployed (%): 60.0
ML estimate for PD (bps) only: 14.1
Standard deviation (bps): 0.1

Conf. level (%)   & 50.00 & 75.00 & 90.00 & 95.00 & 99.00 & 99.90
Upper bound (bps) & 23.5 & 48.3 & 86.4 & 119.4 & 209.4 & 368.9
Std. dev. (bps)   & 0.3 & 0.5 & 0.9 & 1.1 & 2.6 & 7.7

Bayesian neutral estimate for PD (bps): 58.7 
Standard deviation (bps): 1.3 

Bayesian constrained estimate for PD (bps): 53.4 
Standard deviation (bps): 0.5 

Bayesian conservative estimate for PD (bps): 61.6 
Standard deviation (bps): 1.1 
B. Appendix: Output of calculation for Example 4.6



Sun Apr 01 18:38:09 2012 
Multiperiod low default estimation 
Moody's Investment Grade 

Random seed: 36 

Number of ML simulation iterations: 10000 
Number of ML simulation runs: 16 

Number of confidence bounds simulation iterations: 10000 

Number of confidence bounds simulation runs: 16 

Number of inner Bayesian simulation iterations: 1000 

Number of outer Bayesian steps: 2500 

Number of Bayesian simulation runs: 16 

Length of time period: 21 

Total number of obligor-years: 53630 

Total observed number of defaults: 54 

Naive PD estimate (bps): 10.1 

Estimates with estimated correlations: 
ML estimate for PD (bps): 17.6 
Standard deviation (bps): 2.5 
ML estimate for rho (%): 24.3
Standard deviation (%): 1.2
ML estimate for theta (%): 58.0
Standard deviation (%): 4.6

Conf. level (%)   & 50.00 & 75.00 & 90.00 & 95.00 & 99.00 & 99.90
Upper bound (bps) & 14.3 & 23.6 & 35.7 & 45.2 & 69.5 & 109.5
Std. dev. (bps)   & 0.2 & 0.3 & 0.3 & 0.5 & 1.3 & 6.2

Bayesian neutral estimate for PD (bps): 16.6 
Standard deviation (bps): 2.2 

Bayesian constrained estimate for PD (bps): 16.5 
Standard deviation (bps): 2.2 

Bayesian conservative estimate for PD (bps): 16.6 
Standard deviation (bps): 2.2 

Estimates with pre-defined correlations: 
Asset correlation (%): 18.0
Time correlation deployed (%): 60.0
ML estimate for PD (bps) only: 11.5
Standard deviation (bps): 1.4

Conf. level (%)   & 50.00 & 75.00 & 90.00 & 95.00 & 99.00 & 99.90
Upper bound (bps) & 12.8 & 20.0 & 29.1 & 36.2 & 52.9 & 79.7
Std. dev. (bps)   & 0.1 & 0.2 & 0.2 & 0.4 & 1.0 & 3.7

Bayesian neutral estimate for PD (bps): 15.6 
Standard deviation (bps): 2.3 

Bayesian constrained estimate for PD (bps): 15.6 
Standard deviation (bps): 2.3 

Bayesian conservative estimate for PD (bps): 15.6 
Standard deviation (bps): 2.3 